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Abstract 

This paper introduces a directional multiscale algorithm for the two dimensional N- 
QQ body problem of the Helmholtz kernel with applications to high frequency scattering. 

Cn| The algorithm follows the approach in [Engquist and Ying, SIAM Journal on Scientific 

Computing, 29 (4), 2007] where the three dimensional case was studied. The main 
observation is that, for two regions that follow a directional parabolic geometric config- 
uration, the interaction between the points in these two regions through the Helmholtz 
kernel is approximately low rank. We propose an improved randomized procedure for 
r*^ generating the low rank representations. Based on these representations, we organize 

"j^ the computation of the far field interaction in a multidirectional and multiscale way to 

achieve maximum efficiency. The proposed algorithm is accurate and has the optimal 

1 ^ I 0{N log N) complexity for problems from two dimensional scattering applications. We 

present numerical results for several test examples to illustrate the algorithm and its 
application to two dimensional high frequency scattering problems. 

*^ Keywords. A^-body problems; Helmholtz equation; Oscillatory kernels; Fast multipole 

7—1 methods; Multidirectional computation; Multiscale methods. 
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00 1 Introduction 

O 

> 1.1 Problem statement 

• ^ 

^ In this paper, we consider the two dimensional A^-body problem for the high frequency 

Helmholtz kernel. Let {/j, 1 < i < N} be a set of charges located at points {pi, 1 < « < N} 
in M^. We assume that the points {pi} belong to a square centered at the origin with size 
K. The problem is to evaluate the potentials {ui, 1 < i < N} defined by 

TV 

Ui = ^G{pi,pj) ■ fj (1) 
i=i 

where G{x, y) = ^H^\27r\x — y\) is the fundamental solution of the 2D Helmholtz equation. 
In this paper, we use i to denote \/— T. 

This computational problem mostly arises from the numerical solution of 2D time har- 
monic scattering problems |9j . For example, suppose that Z) C is a compact object with 
a smooth boundary and u^""^ is the incoming field. If D represents a sound soft scatterer. 
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the scattering field u satisfies the following Helmholtz equation with the Dirichlet boundary 
condition: 

-An - {2TTfu = mR'^\D 

u{x) = -u^'^'^{x) for X G dD 

fdu \ 

lim r — 2t:iu = 

r— >oo ydr J 

where the wave number is set to be 2tt. The last condition is the Sommerfeld radiation 
condition and guarantees that the scattering field u propagates to infinity. One highly 
efficient way to solve this problem is to reformulate it into an equivalent boundary integral 
equation (BIE) 

+ i^^^^ - '^G{x, y)) <Piy)dy = -n-^(x) (2) 

where n(y) is the exterior normal of dD at y, rj is some fixed constant, and <j){x) for 
x G dD is the unknown charge distribution on the boundary dD. Once (j) is solved from 
the scattering field u can be simply computed with an integral formula [9]. The BIE 
approach has the advantage of reducing the number of unknowns. The discrete version 
of however, is a dense linear system which usually requires an iterative algorithm like 
GMRES [28 for its solution. At each step of the iterative solver, we then need to evaluate 
the computational problem in ([T]), with {pi} being the appropriate quadrature points. 

It is well known that the complexity of a scattering problem often scales with the size of 
scatterer in terms of the wavelength. Since the wavelength is taken to be 1 in our setup, the 
complexity of ([T]) depends on the number K, which can be of order 10^ for a typical large 
scale scattering problem. Since one often uses a constant number of points per wavelength 
when discretizing ([2]), the number of points is proportional to K. 

1.2 Previous work 

Direct computation of ([T| takes 0{N^) operations. This can be quite time consuming when 
N is large. Various fast algorithms have been proposed to reduce this complexity in the 
past two decades. Among them, the most popular approach is the high frequency fast 
multipole method (HF-FMM) developed by Rokhhn et al. [27] . In the HF-FMM, the 
whole computational domain is partitioned into a quadtree and one associates with each 
square of the quadtree two expansions: the far field expansion and the local field expansion 
[7] . These expansions allow one to accelerate the computation in the low frequency region. 
In the high frequency region, the Fourier transforms of these expansions are used instead to 
achieve optimal efficiency since the translations between them become diagonal operators 
under the Fourier basis. The HF-FMM has an optimal O(A^logA) complexity and has 
been widely used. 

A different approach is to discrete the integral equation ^ under the Galerkin frame- 
work using local Fourier bases or wavelet packets. The stiffness matrix becomes approxi- 
mately sparse under these bases since most of the entries are close to zero and can be safely 
discarded. Early algorithms [21 HI |6l [TTl [121 [E] of this approach focus on finding the correct 
one dimensional basis, while a recent development |20j considers the use of two dimensional 
wave packets which can offer more flexibility and better compression rate. 

Another early development is the multilevel matrix decomposition by Michielssen and 
Boag [25 . The three stage multiplication algorithm, which is later named the butterfly 
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Figure 1: Two sets Y and X that satisfy the directional parabohc separation condition. 



algorithm by j26| . is quite similar to the FFT and brings the overall complexity down to 
0{Nlog^ N). 

In [15], we proposed an algorithm for the three dimensional A^-body problem of the 
high frequency Helmholtz kernel. It relies on a low rank property of the 3D Helmholtz 
kernel for certain geometric configurations. The algorithm organizes the computation in a 
multidirectional and multilevel fashion and has an optimal O(A'^logA^) complexity. 



1.3 A multidirectional approach 

In this paper, we adapt the approach in [15j to the two dimensional A^-body problem of 
the Helmholtz kernel. The main idea is a similar low rank property of the 2D Helmholtz 
kernel. We say that two sets Y and X satisfy the directional parabolic separation condition 
if y is a disk of radius r and X is the set of points that belong to a cone with spanning 
angle 1/r and are at least away from Y (see Figure [l]) . 

Once Y and X satisfy the directional parabolic separation condition, one can show that 
for any fixed accuracy the interaction between X and Y via the Helmholtz kernel G{x, y) 
is approximately of low rank and the rank is independent of r. More precisely, for any 
accuracy e, there exist a constant T(e) and two sets of functions {ai{x), 1 < i < T(e)} and 
{(3i{y), 1 < i < T{e)} such that for any x G X and y G Y 



T(e) 

G{x,y) - '^ai{x)Pi{y) 

i=l 



< £ 



(see Theorem 2.2). Notice that {Qj(a;)} and {/3i(y)} are only functions of x and y respec- 
tively. We call such an approximation a directional separated representation. One major 
component of our approach is to use these representations to build equivalent charges for 
well-separated interaction. 

Similar to the 3D algorithm in [15] , our 2D algorithm starts by generating a quadtree for 
the whole computational domain. In the low frequency region where the squares are of size 
less than 1, the interactions are accelerated using the kernel independent FMM algorithm 
in [30]. In the high frequency region where the squares are of size greater than or equal 
to 1, the far field of each square is partitioned into wedges which follow the directional 
parabolic separation condition (see Figure |2]). Between the square and each of its wedges, 
the computation is accelerated via the directional separated representation associated with 
the wedge. 

Apart from extending the multidirectional algorithm of [15] to the 2D Helmholtz kernel, 
this paper also contains two new contributions: 
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Figure 2: Left: the quadtree constructed for a point distribution supported on a curve. 
Right: for each square B in the high frequency region, its far field is partitioned into 
multiple wedges. We construct a low rank representation of the interaction between B 
and each of its wedges. This representation is further used to accelerate the computation 
between B and all the squares in the wedge. 



• We provide an improved randomized procedure for the construction of the directional 
separated representations. The new procedure is more efficient and generates repre- 
sentations with smaller ranks. 

• Our algorithm has been applied to the solution of This allows us to study large 
scatterers that are thousands of wavelengths wide. 

The rest of this paper is organized as follows. In Section [2] we briefly summarize the 
theoretical result on which our approach is based and describe the new improved procedure 
for constructing the separated representations. After describing our algorithm for ([T]) in 
detail in Section [3] we present in Section [4] the numerical results for several test examples. 
Finally, Section [5] provides some comments on future research directions. Though this 
paper focuses on the two dimensional Helmholtz kernel, we would like to point out that our 
algorithm is also applicable to other 2D oscillatory kernels such as e^'^*l^~^L 



2 Directional Separated Representations 



Definition 2.1. Let f{x,y) be a function for x £ X and y £ Y. We say f{x,y) has a 
T-term e-expansion for X and Y if there exist functions {aj(x), 1 < i < T} and {Pi{y), 1 < 
i <T} such that 



f{x,y) -^ai{x)f3i{y) 



i=l 



< £ 



for all x £ X and y £Y . 



Since the two sets of functions {aj(x)} and {/?j(y)} depend only on x and y respectively, 
the above expansion is called separated. Suppose r > ^/2. For our problem, we take 



Y = B{0,r) and X = {x : e{x,£) < l/r,\x\ > r^} 



(3) 



where ^ is a given unit vector and 9{a, b) is the spanning angle between vectors a and b. The 
geometric relationship between Y and X is illustrated in Figure [T] The following theorem 
serves as the theoretical foundation of our approach. 
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Theorem 2.2. For any e > 0, there exists a number T{e) which is independent of r such 
that 

I 

l' 

has a T{e)-term e-expansion for any X and Y given by 



G{x,y)='-Hl,'\27r\x-y\ 



The representation guaranteed by Theorem 2.2 is cahed a directional separated repre- 
sentations for the obvious reason. One way to prove this theorem is to use the asymptotic 
behavior of Hq^^ for large arguments [H |5] : 



and then fohow the same path as the proof for Theorem 2.2 in |15] . 

2.1 Construction of directional separated representation 

A procedure based on random samphng has been described in [15^ for the construction 
of these directional separated representations. In the rest of this section, we propose an 
improved version which gives lower separation ranks and better accuracy based on our 
numerical experience. For a given pair Y and X that satisfy the directional parabolic 
separation condition, our new procedure takes the following steps: 

1. Sample Y randomly with a set of samples {yj, 1 < j < Ny}- In our implementation, 
we use 2 to 3 points per wavelength and the number of samples Ny grows linearly 
with the area of Y. Sample X similarly with a set of samples {xj, 1 < i < Nx}- Let 
A be the matrix defined by 

Aij = G{xi,yj) = ^H^^\2Tr\xi-yj\), 



for 1 < i < Nx and 1 < j < Ny. In the language of linear algebra. Theorem 2.2 
states that A can be factorized, within error 0(e), into the product of two matrices, 
the first containing T(e) columns and the second containing T{e) rows. 

2. Let Ai be the submatrix of A containing a set of A^i randomly selected rows. Here 
we set A^i ~ 3 • T(e) in practice. Our goal is to find a set of T(e) columns of Ai 
that has the largest T(e)-dimensional volume. Since Ai is only of size 0(T(e)) x Ny, 
one can use either the interpolative decomposition [8] or the greedy standard pivoted 
QR factorization to find these columns. Both algorithms have an 0{Ny) complexity. 
Suppose the pivoted QR factorization is used. We then have the decomposition 

AiPi = QiRi, 

where Pi is a permutation matrix, Qi is orthonormal, and Ri is upper triangular. 
Now identify the diagonal elements of i?i which are less than e and truncate the 
associated columns of Qi and rows of Ri. Denote the resulting matrices by Qi^c and 
Ri^c- Since Ai itself has an 0(T(e) )-expansion, (5i,c contains only 0(T(e)) columns 
in practice. Moreover, it is clear that 

Qi,cRi,c = Ai,c, 

where Ai^c is the submatrix containing the columns of Ai from which the matrix Qi^c 
is generated. We denote by Ac the submatrix of A that consists of the same columns. 
The 0(T(e)) samples of Y associated with these columns are denoted {bg}. 
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3. Let A2 be a submatrix of A containing a set of A''2 randomly selected columns. We 
again set N2 ~ 3 • T(e). Repeat the previous step on As a result, we have two 
matrices Q2,r and i?2,r- Q2,r is orthonormal and has 0{T(e)) columns again, while 
R2,r is upper triangular. They satisfy the relationship 



r?* /O* 



where A2^r is a submatrix containing appropriate rows of A. We denote by Ar the 
submatrix of A that consists of the same rows and by {op} the 0(T(e)) samples of 
X associated with these rows (see Figure [s]). 

4. We randomly pick a set S of Ns rows and a set T of A'y columns. In practice, we 
choose Ns and Nt to be equal to 10 • T(e). Set ^3 to be the minor containing the 
elements from rows in S and columns in T, Ac^s to be the submatrix of Ac containing 
the rows in S, and Ar^x to be the submatrix of Aj. containing the columns in T. Next, 
we choose D = {Ac^s)~^ A3{Ar^T)~^ , where ( )"'" stands for pseudoinverse. We claim 
that 



1^ 



0{e). 



Such an approximate factorization is often called a pseudoskeleton approximation of 
A in the literature (see |17| I18|). Notice that the matrix D has only 0(T(e)) rows and 
columns. Denoting the entries of D by dqp, we can rewrite the previous statement in 
the form 



G{xi,yj) - ^G(xi,6g) 



dqp ■ G{ap,yj) 



0{e) 



for all Xi and yj. 



5. Finally, since {xi} and {yj} sample the sets X and Y with a constant number of 
points per wavelength, it is reasonable to expect 



G{x, y) G{x, bq) ■ dqp ■ G{ap, y) 

PA 



0{e) 



(4) 



for any x e X D B{0, K) and y G T. 



Since both {ap} and {hq} are of order 0(T(e)), it is clear that (|4| is a low rank separated 
representation. Moreover, we only need to store {op}, {bq}, and D for (|4|, thus reducing 
the storage requirement dramatically. We would like to point out that recently there has 
been a lot of research devoted to problems similar to ^ (see [HI UHl HH |2lj for details). 

This randomized procedure performs quite well in practice as we will see from the 
numerical results in Section |4j Though we do not yet have a proof, the following heuristic 
argument provides some useful insights. In the standard pseudoskeleton approximation 
|17t [T5] , an m X n matrix A has the following approximation: 



where A^ G, and Aj. are of size m x k, k x k, and k x n respectively. Often Ac contains 
the columns of A that have the largest A;-dimensional volume and, similarly, Ar contains 
the rows with the largest /c-dimensional volume. Finding these columns and rows are 
quite expensive if both m and n are large. Suppose now that we can project the columns 
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Y 




Figure 3: Constructions of the separated representation between X and Y. {bg} are the 
samples associated with the columns in Ac (Step 2). {up} are the samples associated with 
the columns in Ar (Step 3). 

(or rows) of A onto a p dimensional subspace L which is randomly selected from all p- 
dimensional subspaces with the uniform rotational invariant probability measure. As long 
as p is adequately larger than k, the volume spanned by any set of k columns (or rows) is 
preserved to a good accuracy [lUI • Therefore, one efficient method to find the columns 
of A with the largest volume would be to 

1. project A onto a random p dimensional subspace, 

2. find the columns of the projected matrix that have the largest fe-dimensional volume, 

3. pick the corresponding columns of A to be the answer. 

The only difference between this approach and the second and third steps of our randomized 
procedure is that we only project to a random set of coordinates, which is much more 
restrictive than the uniform random projection. However, since both the columns and the 
rows of our matrix A is highly oscillatory and incoherent with the Dirac functions, our 
procedure works well in practice. 

2.2 Equivalent charges 

The directional separated representation (|4]) provides a way to represent the potential in X 
generated by the charges inside y in a compact way. Suppose that X is centered around 
the unit direction i and {/j} are the charges located at points {yi} in Y. After applying 
Q to y = for each yj and summing them up with weight /», we have 



at points {bg} in order to reproduce the potential generated by the charges {fi} located at 
points {yi}. We call the charges in ([5| the directional outgoing equivalent charges of Y in 




This states that we can place a set of charges 




(5) 
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direction i and the points {hq} the directional outgoing equivalent points of Y in direction 
L In addition, we refer to the quantities 



^G{ap,yi)fi 



(6) 



as the directional outgoing check potentials of Y in direction £ and the points {ap} as 
the directional outgoing check points of Y in direction L Given the check potentials, the 
equivalent charges can be computed easily by a multiplication with D. 

Let us now reverse the role of X and Y . Suppose we have a set of charges {/«} located 
at points {xj} in X. Since G{x,y) = G{y,x), 



This states that we can put a set of charges 



0{e) 



(7) 



at points {op} and they reproduce the potential generated by the charges {/«} located at 
points {xi}. Therefore, we call the charges in Q the directional incoming equivalent charges 
of Y in direction £ and the locations {op} the directional incoming equivalent points of Y 
in direction i. In analogy to the previous terminology. 



YGibq,Xi)fi 



(8) 



are called the directional incoming check potentials of Y in direction I and the location {bq} 
are called the directional incoming check points of Y in direction £. 



3 Algorithm Description 

Without loss of generality, we assume that the size of the domain K = 2^^ for a positive 
integer L. 



3.1 Data structure 

We start by constructing a quadtree which contains the whole computational domain. We 
often use B to denote a square in the quadtree and vu for its width. A square B is said to 
be in the low frequency regime 'd w < 1 and in the high frequency regime ii w > 1. In the 
high frequency regime of the quadtree, no adaptivity is used, i.e., every non-empty square 
is further partitioned until the width of the square is less than 1. In the low frequency 
regime, a square B is partitioned as long as the number of points in B is greater than a 
fixed constant Np. The value of Np is chosen to optimize the computational complexity 
and, in practice, we pick Np = 50. 

For a square B in the low frequency regime, its data structure follows the description 
of the kernel independent FMM in |30J. The near field is the union of the squares A 
that satisfies dist{A,B) = 0, where dist{A,B) = 'mfx£A,y£B \x — y\- The far field is 
the complement of . The interaction list contains all the squares in on S's 

level, where P is the parent square of B. 



8 



Figure 4: The far field is partitioned into wedges. From left to right, w = 1,2, 4. The radii 
are 1,4, and 16, respectively. 

• {Vk '"}^ {/^'°}) {^^'°} respectively, the outgoing equivalent points, 
equivalent charges, check points, and check potentials. 

• {y^'*}) {/^'*}) {^^'*} ^^'i {''^fc '*} respectively, the incoming equivalent points, 
equivalent charges, check points, and check potentials. 

For a square B in the high frequency region, the near field is the union of all the 
squares {A} that satisfy dist{A, B) < w"^. The far field is the complement of N^. The 
interaction list contains all the squares in N^\N^ on B^s level, where P is B^s parent 
square. Notice that the far field of a square B in the high frequency region is pushed away 
in order to be compatible with the directional parabolic separation condition. The far field 

is further partitioned into a group of directional wedges, each belonging to a cone with 
spanning angle 0{l/w). We denote the set of all the wedges of B by {H^^'^}. In Figure |4j 
we illustrate the case for for w = 1, 2, 4. 

For each square B and each direction i, we summarize the relevant quantities as follows: 

• {Uk '"'^}^ {/^'°'^}' {^^'"'^}) ^rid {u^'°'^} are the outgoing directional equwalent points, 
equivalent charges, check points and check potentials respectively. 

• {y^'*'^}) {/^'*'^}) {^^f'*'^}) s-rid {n^'*'^} are the incoming directional equiyalent points, 
equivalent charges, check points and check potentials respectively. 

3.2 Translation operators 

Following the convention in |19| [27], we name these operators M2M, L2L, and L2L trans- 
lations, though no multipole or local expansions are involved in our algorithm. The trans- 
lation operators for squares in the low frequency regime are detailed already in [30]. The 
operators in the high frequency regime are more complicated. The main reason is that the 
computations are now directional. 

For a square B in the high frequency regime, the M2M translation constructs the outgo- 
ing directional equivalent charges of B from the outgoing equivalent charges of -B's children. 
There are two cases to consider. In the first case, w = 1. The children squares have only 
nondirectional equivalent charges. The M2M translation iterates over all of the directional 
indices {f\ of B, and the steps for a fixed direction i are as follows: 



Figure 5: i? is a square with width w > 1. For any fixed £, there exists £' such that W^'^ 
is contained in W'^'^ where C is any one of B^s children. 

1. Use {JciUk'"} source points in B and Ucifk "} ^ source charges. Here the union 
is taken over all of the children squares of B. 

2. Compute {u^'°'^} at points {x^'°'^} with kernel evaluation, and then obtain {f^'°'^} 
by multiplying {u^'°'^} with the matrix D associated with the wedge W^'^. 

In the second case, w > 1. Now the children squares have directional equivalent charges 
as well. The M2M translation iterates over all of the directional indices {£} of B. The steps 
for a fixed direction £ are as follows: 

1. Pick £' , a direction associated with the squares of width w/2, such that the wedge 
l^-B/ is contained in the wedge W'^'^ where C stands for anyone of B^s children. The 
existence of £' is ensured by the way we partition (see Figure ^ . 

2. Use IJciUk'"'^ } ^ source points in B and IJcifk '"'^ i source charges. Here the 
union is taken over all the children squares of B. 

3. Compute {uf '°'^} at {x^'"'^} with kernel evaluation and then obtain {/^'°'^} by mul- 
tiplying {u^'°'^} with the matrix D associated with the wedge W^'^. 

The L2L translation constructs the incoming check potentials of i?'s children from the 
incoming directional check potentials of B. Again there are two cases to consider. In the 
first case w = 1. The children squares have only nondirectional check potentials. The L2L 
translation iterates over all of the directional indices {£} of B, and the steps for a fixed 
direction £ are as follows: 

1. Compute {/^'*'^} from {u^'*'^} by multiplying it with the appropriate D matrix. 

2. For each child C of the square B, add to {u^'*} the potentials evaluated at {a;^'*} 
using {/^'*'^} as the source charges at {y^'*'^}. 

In the second case, w > 1. Now the children squares have directional equivalent charges. 
The L2L translation iterates over all of the directional indices {£} of B. The steps for a 
fixed direction £ are as follows: 

1. Pick £' , a direction associated with the squares of width w/2, such that the wedge 
]YB/ jg contained in the wedge W*-^'^ where C stands for anyone of i?'s children. 

2. Compute {/^'*'^} from {u^'^'^} by multiplying it with the appropriate D matrix. 
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Figure 6: A small part of the quadtree used in the computation. Each rectangular region 
stands for a square of the quadtree. The diagram shows how the outgoing nondirectionai 
equivalent charges from a leaf square have been transformed into incoming nondirectionai 
check potentials at other leaf squares. Far field interaction involves directional computation 
in the high frequency regime. 

3. For each child C of the square B, add to {u^'^'^ } the potentials evaluated at {x^'^'^ } 
using {/^'''^} as the source charges at {y^'*'^}. 

Finally, the M2L translation is applied to pairs of squares A and B on the same level 
of the quadtree. They need to be on each other's interaction lists. Suppose B falls into the 
wedge W^'^ of A while A falls into the wedge W^'^ of B. The implementation of the M2L 
translation contains only one step: 

1. Add to {u^'^'^ } the potentials evaluated at {x^'*'^ } using the charges {/^'°'^} at 
points 

To summarize the discussion on the transition operators, we would like to emphasize 
that all of these operators involve only kernel evaluation and matrix-vector multiplication 
with precomputed matrices. Therefore, they are simple to implement and highly efficient. 

3.3 Algorithm 

Now we are ready to give the overall structure of our new algorithm. It has exactly the 
same structure as the 3D algorithm in [15] and we simply reproduce it here: 

1. Construct the quadtree. In the high frequency regime, the squares are partitioned 
uniformly. In the low frequency regime, a leaf square contains at most Np points. 

2. Travel up in the quadtree and visit the squares in the low frequency regime. These 
squares have width less than 1. For each square i?, compute its outgoing nondirec- 



11 



tional equivalent charges {f^'°}- This is done using the low frequency nondirectional 
M2M translation. 

3. Travel up in the quadtree and visit the squares in the high frequency regime. For 
every such square B, use the high frequency directional M2M translation to compute 
the outgoing directional equivalent charges {/^'°'^} for each outgoing direction i. We 
skip the squares with width greater than ^/K since their interaction lists are empty. 

4. Travel down in the quadtree and visit the squares in the high frequency regime. For 
every such square B and for each direction i, perform the following two steps: 

(a) Transform the outgoing directional equivalent charges {/^'°'^} of all of the squares 
{A} in B's interaction list and in direction i via the high frequency directional 
M2L translation. Next, add the result to the incoming directional check poten- 
tials {u^'^'^}- 

(b) Perform the high-frequency directional L2L translation to transform {u^'*'^} to 
the incoming check potentials for i?'s children. 

Again, we skip the squares with width greater than ^/K. 

5. Travel down in the quadtree. For every square B in the low frequency regime, we 
perform the following two steps: 

(a) Transform the outgoing nondirectional equivalent charges {f^'"} of all of the 
squares {A} in S's interaction list via the low frequency nondirectional M2L 
operator. Next, add the result to the incoming nondirectional check potentials 

(b) Perform the low frequency directional L2L translation. Depending on whether 

is a leaf square or not, add the result to the incoming check potentials of S's 
children or to the potentials at the original points inside B. 

An illustration of the various components of the algorithm is given in Figure |6] The 
following theorem summarizes the complexity of the proposed algorithm. 

Theorem 3.1. Let S be a rectifiahle curve in 5(0,1/2). Suppose that for a fixed K the 
points {pi, 1 <i < N} are samples of KE, where N = 0{K) and KE = {K ■ p,p G S} (the 
surface obtained by magnifying S by a factor of K). Then, for any prescribed accuracy, the 
proposed algorithm has a computational complexity 0{K\ogK) = 0{N log N). 

The proof of this theorem follows closely the steps of Theorem 4.1 of [15j. The main 
step of the proof is the observation that, for any fixed w > 1, there are at most 0{K/w) 
squares of size w and, for each of them, there are at most 0{w) squares for which we apply 
the M2L operator. 

4 Numerical Results 

In this section, we provide some numerical results to illustrate the properties of our new 
algorithm. All of the computational results below are obtained on a desktop computer with 
a 2.8 GHz CPU. 

Let us first study the performance of the randomized procedure presented in Section [2] 
In Table [T] we list the number of terms in the separated representation for two sets X and 
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w = 1 


u; = 2 


w = 4 


u; = 8 


w = 16 


w = 32 


w = 64 


w = 128 


e=le-4 


14 


11 


11 


10 


9 


9 


9 


9 


e=le-6 


19 


16 


14 


13 


12 


12 


12 


11 


e=le-8 


27 


20 


16 


15 


15 


15 


14 


14 



Table 1: The separation rank of the directional separated representation for different choices 
of requested accuracy e and square size w. 

Y for different choices of accuracy e and square width w. Here r, the radius of Y, is set to 
be ^/2w so that the square of width w is contained in Y. We can see from Table [l] that the 
separation rank is bounded by a constant which is independent of the values of w. This is 
consistent with our theoretical estimate in Theorem |2.2[ In fact, as w grows, it seems that 
the separation rank decays slightly. 

Next, we applied our algorithm to the A^-body problems on several objects. In our 
experiments, the boundary of each object is represented by a piecewise smooth curve. For 
these tests, the point set {pi} is generated by sampling the curve randomly with about 
20 points per wavelength. The densities {/,} are generated from a random distribution 
with mean 0. We use {uj} to denote the true discrete potentials and {u"} to denote the 
approximations obtained through our algorithm. We estimate the relative error by picking 
a set S of 200 points from {pi}. The true potentials {ui,i S S} are computed by using 
direct evaluation. The error is then estimated to be 

Before reporting the results, let us summarize the notations we use here: N is the 
number of points, K is the size of the problem in terms of the wavelength, e is the prescribed 
error threshold such that the final error is to be bounded by a constant multiple of e, Tq is 
the running time of our algorithm in seconds, is the running time of the direct evaluation 
in seconds, T^/Ta is the speedup factor, and Sa is the resulting error of our algorithm. 

The first example is a circle and the results are summarized in Table [2| The second 
example is an airfoil and the results are shown in Table |3| The final example is a kite-shaped 
object and we report the numbers in Table |4j These numbers demonstrate clearly that our 
algorithm scales exactly like 0(A^log A^) in terms of the number of points. Furthermore, 
the error seems to grow only slightly as we increase the number of points. 

Compared with the results presented in [7], our algorithm is slower by a factor of 8. 
The reason is that we heavily use the kernel evaluation formula in our algorithm. The 2D 
Helmholtz kernel involves the Hankel functions and the current computational procedure 
for their evaluation is rather slow. On the other hand, all of the high frequency translations 
in [7] are precomputed and stored in the diagonal form and no special function evaluation 
is required during the computation. 

Finally, we apply our algorithm to the solution of the BIE formulation 

of the 2D scattering problem mentioned in Section [T} Here, we report the numerical results 
for the smooth objects in Tables |2] and |4j In our experiments, we use a uniform discretization 



13 




{K,e) 


N 


Ta (sec) 


Trf(sec) 


Td/Ta 


ea 


(2048,le- 


4) 


1.13e+5 


3.40e+l 


8.05e+3 


2.37e+2 


1.25e-4 


(8192,le- 


4) 


4.50e+5 


1.56e+2 


1.28e+5 


8.21e+2 


1.31e-4 


(32768,le 


-4) 


1.80e+6 


7.07e+2 


2.06e+6 


2.91e+3 


1.80e-4 


(2048,le- 


6) 


1.13e+5 


5.30e+l 


8.00e+3 


1.51e+2 


7.88e-7 


(8192,le- 


6) 


4.50e+5 


2.39e+2 


1.28e+5 


5.37e+2 


9.98e-7 


(32768,le 


-6) 


1.80e+6 


1.08e+3 


2.06e+6 


1.91e+3 


l.OOe-6 


(2048,le- 


8) 


1.13e+5 


8.20e+l 


8.05e+3 


9.82e+l 


8.48e-9 


(8192,le- 


8) 


4.50e+5 


3.57e+2 


1.29e+5 


3.60e+2 


1.18e-8 


(32768,le 


-8) 


1.80e+6 


1.58e+3 


2.07e+6 


1.31e+3 


1.30e-8 



Table 2: Results of a circle with the Helmholtz kernel. is the number of points, K is 
the size of the problem in terms of the wavelength, e is the prescribed error threshold such 
that the final error is to be bounded by a constant multiple of e, Ta is the running time of 
our algorithm in seconds, is the running time of the direct evaluation in seconds, T^/Ta 
is the speedup factor, and Sa is the estimated error of our algorithm. 

of about 20 points per wavelength. We pick ij = tt and set the incoming field u*"'^(x) to 
be e^'^*^'''^ with d = (1,0). We discretize the integral equation with the Nystrom method 
|9l |22] and use the endpoint-corrected trapezoidal rules from [21] to integrate the weakly 
singular part of the integral. The system is solved iteratively using the GMRES algorithm 
and the restarted number is set to be 80. Within each iteration of the GMRES solver, the 
application of the integral operator is accelerated using our multidirectional algorithm with 
£ =le-4. Table [5] summarizes the results for the circle with wavelengths from 1024 to 8192. 
Here Tj is the averaged time of each iteration, Ni is the number of iterations, and Tf is the 
total time. Table |6] reports the results of the kite-shaped object in Table |4| In Figure [7] 
we display the scattering field of the kite-shaped object in a region with caustics. 

5 Conclusions 

In this paper, we described a directional multiscale algorithm for computing the A^-body 
problem for the high frequency Helmholtz kernel in two dimensions. The approach follows 
the framework described in [15j. Our algorithm is accurate and works well for problems 
in all scales. By using the directional low rank representations for regions that follow the 
directional parabolic separation condition, our algorithm achieves the optimal O(A^logA^) 
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{K,e) 


N 


Ta (sec) 


Trf(sec) 


Td/Ta 


ea 


(2048,le- 


4) 


7.82e+4 


2.00e+l 


3.87e+3 


1.94e+2 


1.15e-4 


(8192,le- 


4) 


3.13e+5 


8.80e+l 


6.17e+4 


7.02e+2 


1.21e-4 


(32768,le 


-4) 


1.25e+6 


3.90e+2 


9.90e+5 


2.54e+3 


1.07e-4 


(2048,le- 


6) 


7.82e+4 


3.20e+l 


3.87e+3 


1.21e+2 


1.04e-6 


(8192,le- 


6) 


3.13e+5 


1.38e+2 


6.20e+4 


4.50e+2 


9.65e-7 


(32768,le 


-6) 


1.25e+6 


6.05e+2 


l.Ole+6 


1.67e+3 


1.20e-6 


(2048,le- 


8) 


7.82e+4 


4.70e+l 


3.87e+3 


8.24e+l 


8.58e-9 


(8192,le- 


8) 


3.13e+5 


2.03e+2 


6.22e+4 


3.06e+2 


1.69e-8 


(32768,le 


-8) 


1.25e+6 


8.78e+2 


9.95e+5 


1.13e+3 


1.33e-8 



Table 3: Results of an airfoil with the Helmholtz kernel. 



complexity. A new and more efficient randomized technique compared to the one in |15] 
has also been introduced for the construction of the low rank separated representations. 
The numerical results have shown that our algorithm is capable of addressing very large 
scale problems in high frequency scattering. 

For future work, we would like to have a rigorous proof for the randomized procedure 
proposed in Section [2] Another interesting direction for future research is to apply this 
kind of directional multiscale idea to other problems with oscillatory behavior, in both two 
and three dimensions. One typical example is the computation of the far field pattern of a 
scattering field [9l|29]. 
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discussions. B.E. is partially supported by an NSF grant DMS 0714612 and a startup grant 
from the University of Texas at Austin. L.Y. is partially supported by an Alfred P. Sloan 
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